xen.git
Tim Deegan [Mon, 10 Mar 2014 10:18:49 +0000 (11:18 +0100)]
x86/time: always count s_time from Xen boot

Timestamped printks() can call NOW() before init_xen_time().
Set a baseline TSC as soon as we've calibrated the TSC rate,
so that NOW() consistently counts from boot time.
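A minimal sketch of the idea. It assumes a hypothetical TSC baseline (boot_tsc_stamp) that is recorded right after calibration and a calibrated rate in kHz (tsc_khz). Xen's real conversion uses its own scaling helpers, so the names and the plain division here are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state: TSC value at the baseline point and calibrated rate. */
static uint64_t boot_tsc_stamp;
static uint64_t tsc_khz;

/* Called as soon as the TSC rate is known (hypothetical name), so that
 * early callers already count from this baseline. */
static void set_time_baseline(uint64_t calibrated_khz, uint64_t tsc_now)
{
    tsc_khz = calibrated_khz;
    boot_tsc_stamp = tsc_now;
}

/* Nanoseconds elapsed since the baseline. Splitting the division into
 * whole milliseconds plus leftover ticks avoids 64-bit overflow. */
static uint64_t s_time_from_tsc(uint64_t tsc_now)
{
    uint64_t delta = tsc_now - boot_tsc_stamp;
    uint64_t ms = delta / tsc_khz;   /* whole milliseconds */
    uint64_t rem = delta % tsc_khz;  /* leftover ticks */

    return ms * 1000000u + rem * 1000000u / tsc_khz;
}
```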

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 10 Mar 2014 10:18:05 +0000 (11:18 +0100)]
x86/schedule: remove noreturn from schedule_tail() function pointer

XenServer has recently had a support case where this bugframe in
context_switch() was hit, presumably from a corrupt function pointer as the
vcpu pointer was fine.

On balance, it is better to leave the bugframe around for peace of mind in
exceptional circumstances, than to use the optimisations provided by noreturn.

At any meaningful level of optimisation, the noreturn causes the bugframe to
be optimised out, meaning that any exceptional returns fall into unlikely
branches, which will result in very weird behaviour.

The unreachable() in BUG() does the useful part of noreturn for us, allowing
the compiler to skip restoring stack frames etc., while still leaving a ud2
instruction in place.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Len Brown [Mon, 10 Mar 2014 10:14:25 +0000 (11:14 +0100)]
x86/mwait_idle: support Intel Atom Processor C2000 product family

Support the "Intel(R) Atom(TM) Processor C2000 Product Family",
formerly code-named Avoton.  It is based on the next generation
Intel Atom processor architecture, formerly code-named Silvermont.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 10 Mar 2014 10:12:30 +0000 (11:12 +0100)]
x86/MCE: mctelem_init() cleanup

The function can be __init, with its caller taking care of only calling
it on the BSP. With that, all its static variables can be dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Andrew Cooper [Mon, 10 Mar 2014 10:11:28 +0000 (11:11 +0100)]
kexec: identify which cpu the kexec image is being executed on

A patch to this effect has been in XenServer for a little while, and has
proved to be a useful debugging aid for servers which behave differently
when crashing on a non-bootstrap processor.

Moving the printk() from kexec_panic() to one_cpu_only() means that it will
only be printed for the cpu which wins the race along the kexec path.
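A minimal sketch of a "first CPU wins" gate using C11 atomics, to show why a printk() placed after such a gate runs exactly once, on the winning CPU. The names one_cpu_only_sketch and crashing_cpu are hypothetical; this is not Xen's actual one_cpu_only().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means no CPU has claimed the crash path yet. */
static atomic_int crashing_cpu = -1;

/* Returns true only for the first caller. Later callers lose the race,
 * so only the winner would go on to print its CPU number. */
static bool one_cpu_only_sketch(int cpu)
{
    int expected = -1;

    return atomic_compare_exchange_strong(&crashing_cpu, &expected, cpu);
}
```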

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Jan Beulich [Mon, 10 Mar 2014 10:06:40 +0000 (11:06 +0100)]
x86/HVM: adjust data definitions in mtrr.c

- use proper section attributes
- use initializers where possible
- clean up pat_type_2_pte_flags()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 10 Mar 2014 10:05:51 +0000 (11:05 +0100)]
x86/HVM: use manifest constants / enumerators for memory types

... instead of literal numbers, thus making it possible for the reader
to understand the code without having to look up the meaning of these
numbers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 10 Mar 2014 10:04:36 +0000 (11:04 +0100)]
x86/HVM: consolidate passthrough handling in epte_get_entry_emt()

It is inconsistent to depend on iommu_enabled alone: For a guest
without devices passed through to it, it is of no concern whether the
IOMMU is enabled.

There's one rather special case to take care of: VMX code marks the
LAPIC access page as MMIO. The added assertion needs to take this into
consideration, and the subsequent handling of the direct MMIO case was
inconsistent too: That page would have been WB in the absence of an
IOMMU, but UC in the presence of it, while in fact the cacheability of
this page is entirely unrelated to an IOMMU being in use.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: "Xu, Dongxiao" <dongxiao.xu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Mon, 10 Mar 2014 10:03:53 +0000 (11:03 +0100)]
x86/HVM: fix memory type merging in epte_get_entry_emt()

Using the minimum numeric value of guest and host specified memory
types is too simplistic: it works correctly only for a subset of
types. In particular, the WT/WP combination needs conversion
to UC if the two types conflict.
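The problem can be illustrated with the architectural x86 memory type encodings (UC=0, WC=1, WT=4, WP=5, WB=6). This sketch uses a hypothetical merge_mem_types() and only covers what the message describes: take the numeric minimum, except that a WT/WP pair becomes UC. It is not the full merging logic of epte_get_entry_emt().

```c
#include <assert.h>

/* Architectural x86 memory type encodings (as used by MTRRs and PAT). */
enum mem_type { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* min() happens to be right for pairs such as UC with anything (UC) or
 * WT with WB (WT), but for WT/WP it would pick WT, although the two
 * types conflict and must degrade to UC. */
static enum mem_type merge_mem_types(enum mem_type guest, enum mem_type host)
{
    if (guest == host)
        return guest;
    if ((guest == MT_WT && host == MT_WP) || (guest == MT_WP && host == MT_WT))
        return MT_UC;
    return guest < host ? guest : host;
}
```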

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: "Xu, Dongxiao" <dongxiao.xu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
Dongxiao Xu [Mon, 10 Mar 2014 10:02:25 +0000 (11:02 +0100)]
x86/hvm: refine the judgment on IDENT_PT for EMT

When trying to get the EPT EMT type, the check on
HVM_PARAM_IDENT_PT is incorrect: it always returns the WB type if
the parameter is not set. Remove the related code.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>

We can't fully drop the dependency yet, but we should certainly avoid
overriding cases already properly handled. The reason for this is that
the guest sets up its MTRRs _after_ the EPT tables have already been
constructed, and no code is in place to propagate this to the
EPT code. Without this check we're forcing the guest to run with all of
its memory uncacheable until something happens to re-write every single
EPT entry. But of course this has to be just a temporary solution.

In the same spirit, we should defer the "very early" (when the guest is
still being constructed and has no vCPU yet) override to the last
possible point.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: "Xu, Dongxiao" <dongxiao.xu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 6 Mar 2014 11:32:48 +0000 (11:32 +0000)]
x86/shadow: add a clarifying assertion

... documenting that we don't have to worry about merging guest
provided flags with the ones we want to enforce ourselves.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 6 Mar 2014 12:44:42 +0000 (13:44 +0100)]
Revert "credit: change default timeslice to 5ms"

This reverts commit 348dee3b8afb72cb4713d2e6600b4e86e0cc1723
(retracted by author/maintainer).

Konrad Rzeszutek Wilk [Thu, 6 Mar 2014 11:23:25 +0000 (12:23 +0100)]
tmem: drop a gross goto usage

No need to do it that way.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
George Dunlap [Thu, 6 Mar 2014 11:19:39 +0000 (12:19 +0100)]
credit: change default timeslice to 5ms

The 30ms timeslice was chosen nearly a decade ago now, with cpu
"burning" workloads in mind.  In the meantime, processors have gotten
faster and VMEXITs have gotten faster.  A timeslice of 30ms has a
major cost when running latency-sensitive workloads like network or
audio streaming: getting caught behind just one or two other VMs can
introduce a processing delay of up to 60ms, and the "round-robin"
nature of the credit scheduler means this delay may be introduced
every time the VM yields for periods of time.
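As a rough worked bound that ignores credit accounting and priority boosting: if a latency-sensitive VM is queued behind k other VMs that each run a full timeslice T, it waits about k·T.

```latex
d_{\max} \approx k \cdot T, \qquad
k = 2:\quad 2 \times 30\,\mathrm{ms} = 60\,\mathrm{ms}
\quad\text{vs.}\quad
2 \times 5\,\mathrm{ms} = 10\,\mathrm{ms}
```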

The XenServer performance team at Citrix have done extensive testing
with various timeslices, including 30ms, 10ms, 5ms, and 2ms.  None of
the workloads exhibited any performance degradation with a 5ms
timeslice.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 27 Feb 2014 15:06:33 +0000 (15:06 +0000)]
x86/hvm: assert that we saved a sane number of MSRs.

Just as a backstop measure against later changes that add MSRs to the
save function without updating the count in the init function.
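A sketch of such a backstop, using hypothetical types and a hypothetical MSR_COUNT_MAX that stands in for the count computed by the init function. Judging by its title, the commit asserts on the number of MSRs saved. This variant instead checks before each append, so it can never write past the buffer.

```c
#include <assert.h>
#include <stdint.h>

/* One saved MSR: its index and value. */
struct msr_entry { uint32_t index; uint64_t val; };

/* Number of entries the (hypothetical) init function reserves. */
#define MSR_COUNT_MAX 1

struct msr_record {
    uint32_t count;
    struct msr_entry msr[MSR_COUNT_MAX];
};

/* Append one MSR, asserting that the init-time count was large enough. */
static void save_one_msr(struct msr_record *rec, uint32_t index, uint64_t val)
{
    assert(rec->count < MSR_COUNT_MAX);
    rec->msr[rec->count].index = index;
    rec->msr[rec->count].val = val;
    rec->count++;
}

static void save_msrs(struct msr_record *rec)
{
    rec->count = 0;
    save_one_msr(rec, 0xd90 /* IA32_BNDCFGS */, 0);
    /* A second save_one_msr() call added here without raising
     * MSR_COUNT_MAX fails the assertion instead of overflowing. */
}
```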

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Tim Deegan [Thu, 28 Nov 2013 15:40:48 +0000 (15:40 +0000)]
bitmaps/bitops: Clarify tests for small constant size.

No semantic changes, just makes the control flow a bit clearer.

I was looking at this because the (-!__builtin_constant_p(x) | x__)
formula is too clever for Coverity, but in fact it always takes me a
minute or two to understand it too. :)
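The formula can be explored with a sketch that models __builtin_constant_p() as an explicit is_const flag, so that it can be exercised at run time. The sketch assumes, hypothetically, that the value chooses between three paths: a zero-size path, a path for sizes that fit in one 64-bit long, and a generic path. The function and enum names are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for the sketch: one long holds 64 bits. */
#define SMALL_LIMIT 64

enum path { PATH_ZERO, PATH_SMALL, PATH_GENERIC };

/* "Clever" form. -!is_const is 0 for a constant size and -1 (all bits
 * set) otherwise, so the selector is either the size itself or -1. */
static enum path clever_form(bool is_const, long nbits)
{
    long sel = -(long)!is_const | nbits;

    if (sel == 0)
        return PATH_ZERO;
    if (sel > 0 && sel <= SMALL_LIMIT)
        return PATH_SMALL;
    return PATH_GENERIC;   /* sel == -1 (not constant) or too large */
}

/* The same decision with the control flow spelled out. */
static enum path clear_form(bool is_const, long nbits)
{
    if (!is_const)
        return PATH_GENERIC;
    if (nbits == 0)
        return PATH_ZERO;
    if (nbits > 0 && nbits <= SMALL_LIMIT)
        return PATH_SMALL;
    return PATH_GENERIC;
}
```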

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Tim Deegan [Thu, 28 Nov 2013 15:02:39 +0000 (15:02 +0000)]
x86/mem_sharing: drop unused variable.

Coverity CID 1087198

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Tim Deegan [Thu, 28 Nov 2013 14:59:07 +0000 (14:59 +0000)]
x86/shadow: Drop shadow_mode_trap_reads()

This was never actually implemented, and is confusing Coverity.

Coverity CID 1090354

Signed-off-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 28 Nov 2013 14:33:06 +0000 (14:33 +0000)]
common/vsprintf: Explicitly treat negative lengths as 'unlimited'

The old code relied on implicitly casting negative numbers to size_t
to produce a very large limit, which was correct but non-obvious.
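The difference between the two styles can be sketched with two hypothetical helpers. Neither is the actual vsprintf code. Both return a huge limit for -1. The explicit form states the "unlimited" intent directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old style: the implicit int -> size_t conversion wraps a negative
 * length to a huge value (SIZE_MAX for -1). Correct, but easy to miss. */
static size_t limit_implicit(int len)
{
    return len;
}

/* Explicit style: a negative length simply means "unlimited". */
static size_t limit_explicit(int len)
{
    return len < 0 ? SIZE_MAX : (size_t)len;
}
```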

Coverity CID 1128575

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:19:20 +0000 (11:19 +0100)]
x86: identify reset_stack_and_jump() as noreturn

reset_stack_and_jump() is actually a macro, but can effectively become noreturn
by ending it with unreachable().

Propagate the 'noreturn-ness' up through the direct and indirect callers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:18:28 +0000 (11:18 +0100)]
misc cleanup as a result of the previous patches

This includes:
 * A stale comment in sh_skip_sync()
 * A dead forever loop in __bug()
 * A prototype for machine_power_off(), which is unimplemented in every architecture
 * Replacing a for(;;); loop with unreachable()

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:17:03 +0000 (11:17 +0100)]
identify panic and reboot/halt functions as noreturn

On an x86 build (GCC Debian 4.7.2-5), this substantially reduces the size of
.text and .init.text sections.

Experimentally, even in a non-debug build, GCC uses `call` rather than `jmp`
so there should be no impact on any stack trace generation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:15:47 +0000 (11:15 +0100)]
compiler: replace opencoded __attribute__((noreturn))

Make a formal define for noreturn in compiler.h, and fix up opencoded uses of
__attribute__((noreturn)).  This includes removing redundant uses with
function definitions which have a public declaration.
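A minimal sketch of such a define and its effect. The __noreturn__ spelling is an assumption, and Xen's compiler.h may spell the define differently. The functions fatal() and pick() are hypothetical. Because fatal() is marked noreturn, the compiler knows the default case never falls off the end of pick(), so it does not warn about a missing return.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* compiler.h-style shorthand (sketch). */
#define noreturn __attribute__((__noreturn__))

/* The compiler may assume control never comes back from here. */
static void noreturn fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

/* No "control reaches end of non-void function" warning for the
 * default case, because fatal() is known not to return. */
static int pick(int which)
{
    switch (which) {
    case 0:
        return 10;
    case 1:
        return 20;
    default:
        fatal("bad selector");
    }
}
```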

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:14:53 +0000 (11:14 +0100)]
x86/crash: fix up declaration of do_nmi_crash()

... so it can correctly be annotated as noreturn.  Move the declaration of
nmi_crash() to be effectively private in crash.c.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 10:03:13 +0000 (11:03 +0100)]
include: parallelize compat/xlat.h generation

Splitting this up into pieces significantly speeds up building on
multi-CPU systems when making use of make's -j option.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 10:01:57 +0000 (11:01 +0100)]
correctly use gcc's -x option

In Linux the improper use was found to cause problems with certain
distributed build environments. Even if not directly affecting us, be
on the safe side.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 10:00:26 +0000 (11:00 +0100)]
x86/ACPI: also print address space for PM1x fields

At least one vendor is in the process of making systems available where
these live in MMIO, not in I/O port space.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 09:59:44 +0000 (10:59 +0100)]
x86/AMD: re-use function wide variables in init_amd()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 09:58:19 +0000 (10:58 +0100)]
x86: don't propagate acpi_skip_timer_override to Dom0

It's unclear why c/s 4850:923dd9975981 added this - Dom0 isn't
controlling the timer interrupt, and hence has no need to know.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 4 Mar 2014 09:55:56 +0000 (10:55 +0100)]
x86/time: avoid redundant this_cpu()

this_cpu() makes use of RELOC_HIDE() to prevent unsafe optimisations, forcing
a recalculation of the per-cpu data area.  Don't use it needlessly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 4 Mar 2014 09:54:21 +0000 (10:54 +0100)]
x86/time: cleanup

Eliminate effectively unused variables mistakenly left in place by
9539:08aede767c63 ("Rename update_dom_time() to
update_vcpu_system_time()").

Drop the pointless casts.

Use SECONDS() instead of open coding it.
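For illustration, here is a helper of this kind next to the open-coded form it replaces. The SECONDS() definition and the s_time_t typedef below were written for this sketch and are not copied from Xen's headers.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* system time in nanoseconds */

/* Assumed shape of the helper: seconds -> s_time_t nanoseconds. */
#define SECONDS(_s) ((s_time_t)((_s) * 1000000000ULL))

/* Open-coded conversion that such a helper replaces. */
static s_time_t deadline_open_coded(s_time_t now, unsigned int secs)
{
    return now + (s_time_t)secs * 1000000000LL;
}

static s_time_t deadline_with_helper(s_time_t now, unsigned int secs)
{
    return now + SECONDS(secs);
}
```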

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 09:52:20 +0000 (10:52 +0100)]
IOMMU: generalize and correct softirq processing during Dom0 device setup

c/s 21039:95f5a4ce8f24 ("VT-d: reduce default verbosity") putting a
call to process_pending_softirqs() in VT-d's domain_context_mapping()
was wrong in two ways. Firstly, we shouldn't be doing this when setting
up a device during DomU assignment. Secondly (I didn't check whether
that was already the case back then), we shouldn't call that function
with the pcidevs_lock (or indeed any spin lock) held.

Move the "preemption" into generic code, at the same time dealing with
further issues. Some are actual: too much output elsewhere, particularly
on systems with very many host-bridge-like devices, has been observed to
still trigger the watchdog when it is enabled. Others are potential:
other IOMMU code may also end up being too verbose.

Do the "preemption" once per device actually being set up when in
verbose mode, and once per bus otherwise.
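The per-device versus per-bus policy can be sketched with hypothetical stubs that stand in for device setup and for process_pending_softirqs(). Lock handling is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters so the policy can be observed. */
static unsigned int devices_set_up, softirq_runs;

static void setup_one_device(int bus, int devfn)
{
    (void)bus;
    (void)devfn;
    devices_set_up++;
}

static void process_pending_softirqs_stub(void)
{
    softirq_runs++;
}

/* "Preempt" after every device in verbose mode (lots of console output
 * per device), otherwise only once per bus. */
static void setup_dom0_devices_sketch(int nr_buses, int devs_per_bus, bool verbose)
{
    for (int bus = 0; bus < nr_buses; bus++) {
        for (int devfn = 0; devfn < devs_per_bus; devfn++) {
            setup_one_device(bus, devfn);
            if (verbose)
                process_pending_softirqs_stub();
        }
        if (!verbose)
            process_pending_softirqs_stub();
    }
}
```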

Note that dropping pcidevs_lock around the process_pending_softirqs()
invocation is specifically not a problem here: we're in an __init
function and aren't racing with potential additions/removals of PCI
devices. Not acquiring the lock in setup_dom0_pci_devices(), on the other
hand, is not an option, as there are too many places that assert that the
lock is held.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Wei Liu [Fri, 28 Feb 2014 16:35:15 +0000 (17:35 +0100)]
mm: ensure useful progress in decrease_reservation

While playing with the balloon driver, I found that the hypervisor's
preemption check kept decrease_reservation from doing any useful work
for 32-bit guests, causing the guests to hang.

As Andrew suggested, we can force the check to fail for the first
iteration to ensure progress. We did this in d3a55d7d9 "x86/mm: Ensure
useful progress in alloc_l2_table()" already.
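The "skip the check on the first iteration" pattern can be sketched as follows. The functions release_extents() and preempt_check_stub() are hypothetical. The stub always requests preemption, which shows the worst case: without the i != start guard, no call would ever make progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Worst case: the preemption check always says "stop now". */
static bool preempt_check_stub(void)
{
    return true;
}

/* Process items [start, end) and return the resume point. The check is
 * skipped for the first item, so every invocation (including each
 * continuation) handles at least one item. */
static unsigned int release_extents(unsigned int start, unsigned int end)
{
    for (unsigned int i = start; i < end; i++) {
        if (i != start && preempt_check_stub())
            return i;   /* continuation resumes here */
        /* ... release extent i ... */
    }
    return end;
}
```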

After this change I cannot see the hang caused by continuation logic
anymore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 28 Feb 2014 16:13:47 +0000 (17:13 +0100)]
xsm: streamline xsm_default_action()

That the privileges are strongly ordered is better reflected by using
fall-through within the respective switch statement.
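Here is a sketch of the fall-through style for strictly ordered checks: each level also accepts any caller that would pass a more privileged check further down. The action names, struct caller and its flags are hypothetical simplifications. They are not the real xsm_default_action() or its XSM_* constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, strictly ordered actions (least to most privileged). */
enum action { ACT_HOOK, ACT_TARGET, ACT_DM_PRIV, ACT_PRIV };

struct caller {
    int id;
    bool is_dm_for_target;   /* acts as device model for the target */
    bool is_privileged;      /* fully privileged domain */
};

static int default_action(enum action a, const struct caller *src, int target_id)
{
    switch (a) {
    case ACT_HOOK:
        return 0;                  /* anyone */
    case ACT_TARGET:
        if (src->id == target_id)
            return 0;              /* acting on itself */
        /* fall through */
    case ACT_DM_PRIV:
        if (src->is_dm_for_target)
            return 0;
        /* fall through */
    case ACT_PRIV:
        if (src->is_privileged)
            return 0;
        return -EPERM;
    }
    return -EPERM;                 /* unknown action */
}
```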

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Fri, 28 Feb 2014 16:13:05 +0000 (17:13 +0100)]
xsm: use # printk format modifier

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Fri, 28 Feb 2014 16:12:13 +0000 (17:12 +0100)]
flask: use xzalloc()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Fri, 28 Feb 2014 16:08:36 +0000 (17:08 +0100)]
flask: add compat mode guest support

... which has been missing since the introduction of the new interface
in the 4.2 development cycle.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 28 Feb 2014 16:04:04 +0000 (17:04 +0100)]
vsprintf: introduce %pv extended format specifier to print domain/vcpu ID pair

... in a simplified and consistent way.
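Here is a sketch of the kind of formatting that such a specifier centralises. The "d<domain>v<vcpu>" output shape and the struct names are assumptions for illustration. In Xen, the specifier itself is handled inside vsprintf rather than by a separate helper like this one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct domain_sk { int domain_id; };
struct vcpu_sk { int vcpu_id; const struct domain_sk *domain; };

/* One consistent rendering of a (domain, vcpu) pair. */
static int format_vcpu(char *buf, size_t size, const struct vcpu_sk *v)
{
    return snprintf(buf, size, "d%dv%d", v->domain->domain_id, v->vcpu_id);
}
```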

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Wed, 18 Dec 2013 14:12:31 +0000 (14:12 +0000)]
x86/p2m: drop second pass looking for shared pages.

We have run relinquish_shared_pages() already by the time this
teardown happens, and page_make_sharable() exits early if the owning
domain is dying.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Tim Deegan [Thu, 21 Nov 2013 13:02:34 +0000 (13:02 +0000)]
x86/mm: Don't allow p2m allocation after memory is allocated.

This avoids a potentially long loop populating the p2m table from the
m2p.  Since there's no reason to turn on translate mode after the
domain is already running, this shouldn't be a problem.

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Thu, 30 Jan 2014 21:34:16 +0000 (22:34 +0100)]
mem_event: Return previous value of CR0/CR3/CR4 on change.

This patch extends the information returned for CR0/CR3/CR4 register
write events with the previous value of the register. The old value
was already passed to the trap processing function, just never placed
into the returned request. By returning this value, applications
subscribing to the CR events obtain additional context about the event.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Tim Deegan <tim@xen.org>
Aravind Gopalakrishnan [Wed, 26 Feb 2014 16:25:04 +0000 (17:25 +0100)]
ns16550: Add support for UART present in Broadcom TruManage capable NetXtreme chips

Since it is an MMIO device, the code has been modified to accept MMIO-based
devices as well. MMIO device settings are populated in the 'uart_config' table.
The device also advertises a 64-bit BAR, so the code is reworked to account
for 64-bit BARs and 64-bit MMIO lengths.

Further quirks: the register offset needs to be shifted by a specific
value, and both the UART_LSR_THRE and UART_LSR_TEMT bits need to be
verified as set before transmitting data.
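Both quirks can be sketched against a fake register window. Register offsets are scaled by a reg_shift, which is 2 here (32-bit register spacing, an assumption), and the transmit-ready test requires both LSR bits. The LSR index and bit values are the standard 16550 ones. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UART_LSR       0x05   /* line status register index */
#define UART_LSR_THRE  0x20   /* transmit holding register empty */
#define UART_LSR_TEMT  0x40   /* transmitter empty */

/* Fake MMIO window: 8 registers, each 4 bytes apart. */
static uint8_t mmio[8 << 2];

/* Register offsets are scaled by reg_shift before the access. */
static uint8_t uart_read(unsigned int reg, unsigned int reg_shift)
{
    return mmio[reg << reg_shift];
}

/* Ready to transmit only when BOTH bits are set, not just THRE. */
static bool uart_tx_ready(unsigned int reg_shift)
{
    uint8_t lsr = uart_read(UART_LSR, reg_shift);
    uint8_t both = UART_LSR_THRE | UART_LSR_TEMT;

    return (lsr & both) == both;
}
```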

When testing, add com1=115200,8n1,pci,0 to the Xen command line to observe
console output via SoL.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Thomas Lendacky <Thomas.Lendacky@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Wed, 26 Feb 2014 16:23:47 +0000 (17:23 +0100)]
x86/faulting: Use formal defines instead of opencoded bits

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Wed, 26 Feb 2014 16:22:30 +0000 (17:22 +0100)]
x86/cpu: Store extended cpuid level in cpuinfo_x86

To save finding it repeatedly with cpuid instructions.  The name
"extended_cpuid_level" is chosen to match Linux.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 26 Feb 2014 16:21:22 +0000 (17:21 +0100)]
x86/time: Remove redundant RTC REG_B read

RTC_ALWAYS_BCD is always defined by default, meaning that we will
unconditionally enter the if statement.  Reordering the condition allows
short-circuit evaluation to remove a redundant CMOS read.
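The effect of the reordering can be shown with a counting stub in place of the CMOS read. The helper names and the fake REG_B value are hypothetical, and RTC_ALWAYS_BCD is assumed to be 1. RTC_DM_BINARY is the standard REG_B data-mode bit.

```c
#include <assert.h>
#include <stdbool.h>

#define RTC_ALWAYS_BCD 1      /* defined by default, per the message */
#define RTC_DM_BINARY  0x04   /* REG_B "data mode: binary" bit */

static unsigned int cmos_reads;
static unsigned char fake_reg_b = 0;   /* BCD mode */

static unsigned char cmos_read_reg_b(void)
{
    cmos_reads++;
    return fake_reg_b;
}

/* Before: the CMOS read is evaluated first, so it always happens. */
static bool is_bcd_before(void)
{
    return !(cmos_read_reg_b() & RTC_DM_BINARY) || RTC_ALWAYS_BCD;
}

/* After: the constant comes first, so || short-circuits the read away. */
static bool is_bcd_after(void)
{
    return RTC_ALWAYS_BCD || !(cmos_read_reg_b() & RTC_DM_BINARY);
}
```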

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 25 Feb 2014 08:41:40 +0000 (09:41 +0100)]
x86: MSR_IA32_BNDCFGS save/restore

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 25 Feb 2014 08:40:31 +0000 (09:40 +0100)]
x86: generic MSRs save/restore

This patch introduces a generic MSR save/restore mechanism, so that
in future, save/restore of new MSRs can be added with a smaller change
than the full-blown addition of a new save/restore type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
Xudong Hao [Tue, 25 Feb 2014 08:38:21 +0000 (09:38 +0100)]
x86: MPX IA32_BNDCFGS msr handle

When MPX is supported, a new guest-state field for IA32_BNDCFGS
is added to the VMCS. In addition, two new controls are added:
 - a VM-exit control called "clear BNDCFGS"
 - a VM-entry control called "load BNDCFGS".
VM exits always save IA32_BNDCFGS into the BNDCFGS field of the VMCS.

Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>

Unlikely as it is, in case VMX support is not available, don't expose
MPX to HVM guests.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 25 Feb 2014 08:34:04 +0000 (09:34 +0100)]
x86/xsave: enable support for new ISA extensions

Intel has released a new version of Intel Architecture Instruction Set
Extensions Programming Reference, adding new features like AVX-512,
MPX, etc. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf

This patch adds support for these new instruction set extensions
without yet enabling this support for guest use.

It also adjusts XCR0 validation, at once fixing the definition of
XSTATE_ALL (which is not supposed to include bit 63).
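For context, here is a sketch of the architectural XCR0 consistency rules from the Intel SDM for the components involved (x87, SSE, AVX, MPX, AVX-512), including the rule that bit 63 is never valid in XCR0. The XSTATE_* names mirror common usage, but valid_xcr0() is hypothetical and is not Xen's validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XSTATE_FP       (1ULL << 0)
#define XSTATE_SSE      (1ULL << 1)
#define XSTATE_YMM      (1ULL << 2)
#define XSTATE_BNDREGS  (1ULL << 3)
#define XSTATE_BNDCSR   (1ULL << 4)
#define XSTATE_OPMASK   (1ULL << 5)
#define XSTATE_ZMM      (1ULL << 6)
#define XSTATE_HI_ZMM   (1ULL << 7)
#define XSTATE_AVX512   (XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)

/* 'supported' is the set of components the CPU reports. */
static bool valid_xcr0(uint64_t xcr0, uint64_t supported)
{
    if (xcr0 & (1ULL << 63))       /* reserved, never a state component */
        return false;
    if (xcr0 & ~supported)         /* component not offered by the CPU */
        return false;
    if (!(xcr0 & XSTATE_FP))       /* x87 state must always be enabled */
        return false;
    if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
        return false;              /* AVX requires SSE */
    if (!(xcr0 & XSTATE_BNDREGS) != !(xcr0 & XSTATE_BNDCSR))
        return false;              /* MPX components go together */
    if (xcr0 & XSTATE_AVX512) {
        if ((xcr0 & XSTATE_AVX512) != XSTATE_AVX512)
            return false;          /* all three AVX-512 bits or none */
        if (!(xcr0 & XSTATE_YMM))
            return false;          /* AVX-512 requires AVX */
    }
    return true;
}
```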

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Keir Fraser <keir@xen.org>
Julien Grall [Tue, 25 Feb 2014 08:31:29 +0000 (09:31 +0100)]
xsm: Fix xsm_map_gfmn_foreign prototype when XSM is enabled

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Andrew Cooper [Tue, 25 Feb 2014 08:30:59 +0000 (09:30 +0100)]
x86/mce: Reduce boot-time logspam

When booting with "no-mce", the user does not need to be told that "MCE
support [was] disabled by bootparam" for each cpu.  Furthermore, a file:line
reference is unnecessary.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tim Deegan [Tue, 25 Feb 2014 08:30:21 +0000 (09:30 +0100)]
x86/hvm/rtc: always deassert the IRQ line when clearing REG_C.IRQF

Even in no-ack mode, there's no reason to leave the line asserted
after an explicit ack of the interrupt.

Furthermore, rtc_update_irq() is unconditionally a no-op immediately after
REG_C has been cleared.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tim Deegan [Tue, 25 Feb 2014 08:29:26 +0000 (09:29 +0100)]
x86/hvm/rtc: inject RTC periodic interrupts from the vpt code

Let the vpt code drive the RTC's timer interrupts directly, as it does
for other periodic time sources, and fix up the register state in a
vpt callback when the interrupt is injected.

This fixes a hang seen on Windows 2003 in no-missed-ticks mode, where
when a tick was pending, the early callback from the VPT code would
always set REG_C.PF on every VMENTER; meanwhile the guest was in its
interrupt handler reading REG_C in a loop and waiting to see it clear.

One drawback is that a guest that attempts to suppress RTC periodic
interrupts by failing to read REG_C will receive up to 10 spurious
interrupts, even in 'strict' mode.  However:
 - since all previous RTC models have had this property (including
   the current one, since 'no-ack' mode is hard-coded on) we're
   pretty sure that all guests can handle this; and
 - we're already playing some other interesting games with this
   interrupt in the vpt code.

One other corner case: a guest that enables the PF timer interrupt,
masks the interrupt in the APIC and then polls REG_C looking for PF
will not see PF getting set.  The more likely case of enabling the
timers and masking the interrupt with REG_B.PIE is already handled
correctly.

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tim Deegan [Tue, 25 Feb 2014 08:26:45 +0000 (09:26 +0100)]
x86/hvm/rtc: don't run the vpt timer when !REG_B.PIE

If the guest has not asked for interrupts, don't run the vpt timer
to generate them.  This is a prerequisite for a patch to simplify how
the vpt interacts with the RTC, and also gets rid of a timer series in
Xen in a case where it's unlikely to be needed.

Instead, calculate the correct value for REG_C.PF whenever REG_C is
read or PIE is enabled.  This allows a guest to poll for the PF bit
while not asking for actual timer interrupts.  Such a guest would no
longer get the benefit of the vpt's timer modes.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Mon, 24 Feb 2014 12:57:53 +0000 (12:57 +0000)]
libxl: Fix libxl_postfork_child_noexec deadlock etc.

libxl_postfork_child_noexec would nestedly reacquire the non-recursive
"no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
The result on Linux is that the process always deadlocks before
returning from this function.

This is used by xl's console child.  So, the ultimate effect is that
xl with pygrub does not manage to connect to the pygrub console.
This behaviour was reported by Michael Young in Xen 4.4.0 RC5.

Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
documented to suffice if called only on one ctx.  So deregistering the
ctx it's called on is not sufficient.  Instead, we need a new approach
which discards the whole sigchld_user list and unconditionally removes
our SIGCHLD handler if we had one.

Prompted by this, clarify the semantics of
libxl_postfork_child_noexec.  Specifically, expand on the meaning of
"quickly" by explaining what operations are not permitted; and
document the fact that the function doesn't reclaim the resources in
the ctxs.

And add a comment in libxl_postfork_child_noexec explaining the
internal concurrency situation.

This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: M A Young <m.a.young@durham.ac.uk>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
(cherry picked from commit 5be1e95318147855713709094e6847e3104ae910)

Julien Grall [Mon, 24 Feb 2014 11:33:00 +0000 (12:33 +0100)]
iommu: don't need to map dom0 page when the PT is shared

Currently iommu_init_dom0 walks the page list and calls the map_page
callback on each page.

In both the AMD and VT-d drivers, the callback returns immediately if the
page table is shared with the processor, so Xen can safely avoid running
through the page list.
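Here is a sketch of the shortcut, with a hypothetical pt_shared flag and a counting stub in place of the per-page map callback.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int map_calls;

/* Stand-in for the per-page IOMMU map callback. */
static void map_page_stub(unsigned long pfn)
{
    (void)pfn;
    map_calls++;
}

/* When the IOMMU shares the CPU's page tables, every map_page() call
 * would return immediately, so skip walking the page list entirely. */
static void iommu_init_dom0_sketch(const unsigned long *pfns, unsigned int nr,
                                   bool pt_shared)
{
    if (pt_shared)
        return;
    for (unsigned int i = 0; i < nr; i++)
        map_page_stub(pfns[i]);
}
```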

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Mon, 24 Feb 2014 11:32:00 +0000 (12:32 +0100)]
vtd: don't export iommu_set_pgd

iommu_set_pgd is only used internally in
xen/drivers/passthrough/vtd/iommu.c

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Xiantoa Zhang <xiantao.zhang@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Mon, 24 Feb 2014 11:31:28 +0000 (12:31 +0100)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

Julien Grall [Mon, 24 Feb 2014 11:21:54 +0000 (12:21 +0100)]
vtd: don't export iommu_domain_teardown

iommu_domain_teardown is only used internally in
xen/drivers/passthrough/vtd/iommu.c

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Fabio Fantoni [Sat, 22 Feb 2014 10:35:54 +0000 (11:35 +0100)]
libxl: comments cleanup on libxl_dm.c

Remove some unhelpful comment lines.

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Xudong Hao [Mon, 24 Feb 2014 11:11:53 +0000 (12:11 +0100)]
x86: expose RDSEED, ADX, and PREFETCHW to dom0

This patch explicitly exposes new Intel features to dom0, including
RDSEED and ADX. PREFETCHW does not need to be exposed explicitly.
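
As a rough illustration, the sketch below passes the two leaf-7 bits through to a dom0 view of CPUID. The bit positions (RDSEED = bit 18, ADX = bit 19 of CPUID.(EAX=7,ECX=0):EBX) follow the Intel SDM; the `dom0_leaf7_ebx()` helper and its `allowed` mask are hypothetical and do not mirror Xen's actual CPUID policy code.

```c
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX feature bits, per the Intel SDM. */
#define CPUID7_EBX_RDSEED (1u << 18)
#define CPUID7_EBX_ADX    (1u << 19)

/*
 * Hypothetical policy helper: given the host's leaf-7 EBX value and the
 * mask of features dom0 may already see, additionally allow RDSEED and
 * ADX. A feature is only reported if the host actually has it.
 */
static uint32_t dom0_leaf7_ebx(uint32_t host_ebx, uint32_t allowed)
{
    allowed |= CPUID7_EBX_RDSEED | CPUID7_EBX_ADX;
    return host_ebx & allowed;
}
```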

Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 24 Feb 2014 11:11:01 +0000 (12:11 +0100)]
x86/MSI: don't risk division by zero

The check in question is redundant with the one in the immediately
following if(), where dividing by zero gets carefully avoided.

Spotted-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Yang Zhang [Mon, 24 Feb 2014 11:09:52 +0000 (12:09 +0100)]
Nested VMX: update nested paging mode on vmexit

Since SVM and VMX use different mechanisms to emulate virtual vmentry
and virtual vmexit, it is hard to update the nested paging mode correctly
in common code, so it has to be updated in each implementation's own
code path.
SVM already updates the nested paging mode on vmexit. This patch adds the
same logic on the VMX side.

Previous discussion is here:
http://lists.xen.org/archives/html/xen-devel/2013-12/msg01759.html

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Christoph Egger <chegger@amazon.de>
Aravind Gopalakrishnan [Mon, 24 Feb 2014 11:09:14 +0000 (12:09 +0100)]
vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs

The vmce_amd_[rd|wr]msr functions can handle accesses to AMD
thresholding registers, but because of this statement:

    switch ( msr & (MSR_IA32_MC0_CTL | 3) )

we were wrongly masking off the top two bits, which meant the register
accesses never reached the vmce_amd_* functions.

Correct this problem by modifying the mask so that AMD thresholding
registers fall through to the 'default' case, which in turn allows the
vmce_amd_* functions to handle accesses to these registers.
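
A small sketch of why the mask matters. The classifier below is hypothetical; it assumes MSR_IA32_MC0_CTL is 0x400 and uses 0xC0000408 as an example of an AMD extended thresholding MSR. The corrected mask shown (`-MSR_IA32_MC0_CTL | 3`) is one way to keep the high bits and is an assumption here, not necessarily the exact expression in the patch.

```c
#include <stdint.h>

#define MSR_IA32_MC0_CTL    0x00000400u
#define MSR_IA32_MC0_STATUS 0x00000401u

enum mce_kind { MCE_CTL, MCE_STATUS, MCE_ADDR, MCE_MISC, MCE_OTHER };

/*
 * Classify an MSR by bank register type. The old mask (MC0_CTL | 3) keeps
 * only bits 10, 1 and 0, so an MSR such as 0xC0000408 collapses onto 0x400.
 * Keeping every bit from bit 10 upwards (-MC0_CTL == 0xFFFFFC00) leaves such
 * MSRs distinct, so they reach the default case. "fixed" selects the mask.
 */
static enum mce_kind classify(uint32_t msr, int fixed)
{
    uint32_t mask = fixed ? ((-MSR_IA32_MC0_CTL) | 3u)
                          : (MSR_IA32_MC0_CTL | 3u);

    switch ( msr & mask )
    {
    case MSR_IA32_MC0_CTL:     return MCE_CTL;
    case MSR_IA32_MC0_STATUS:  return MCE_STATUS;
    case MSR_IA32_MC0_CTL + 2: return MCE_ADDR;
    case MSR_IA32_MC0_CTL + 3: return MCE_MISC;
    default:                   return MCE_OTHER; /* vendor-specific MSRs */
    }
}
```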

While at it, remove some clutter in the vmce_amd_* functions, retaining
the current policy of returning zero for reads and ignoring writes.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Frediano Ziglio [Mon, 24 Feb 2014 11:07:41 +0000 (12:07 +0100)]
x86/MCE: Fix race condition in mctelem_reserve

These lines (in mctelem_reserve)

        newhead = oldhead->mcte_next;
        if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {

are racy. After the newhead pointer has been read, another flow (a thread
or a recursive invocation) can change the whole list but leave the head
set to the same value. oldhead then still equals *freelp, so the cmpxchg
succeeds, but the new head may point to any element (even one already in
use).

This patch instead uses a bit array and atomic bit operations.
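
A minimal sketch of the bit-array approach using C11 atomics. `slot_reserve()`, `slot_release()` and the pool size are hypothetical, and this is not the mctelem code itself: a slot belongs to whichever caller's atomic_fetch_or() flips its bit from 0 to 1, so there is no list head whose next pointer can go stale between the read and the update.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_ENTRIES    64   /* hypothetical pool size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((NR_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per slot: 1 = in use, 0 = free. Static, so zero-initialised. */
static atomic_ulong pool_bits[NR_WORDS];

/* Claim a free slot; returns its index, or -1 if the pool is exhausted. */
static int slot_reserve(void)
{
    for ( size_t i = 0; i < NR_ENTRIES; i++ )
    {
        unsigned long bit = 1UL << (i % BITS_PER_WORD);
        atomic_ulong *word = &pool_bits[i / BITS_PER_WORD];

        if ( atomic_load(word) & bit )
            continue;               /* cheap pre-check; fetch_or decides */
        if ( !(atomic_fetch_or(word, bit) & bit) )
            return (int)i;          /* we flipped 0 -> 1: the slot is ours */
    }
    return -1;
}

/* Return a slot to the pool by atomically clearing its bit. */
static void slot_release(int i)
{
    unsigned long bit = 1UL << ((size_t)i % BITS_PER_WORD);

    atomic_fetch_and(&pool_bits[(size_t)i / BITS_PER_WORD], ~bit);
}
```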

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Ian Jackson [Fri, 21 Feb 2014 16:59:45 +0000 (16:59 +0000)]
QEMU_TAG, QEMU_UPSTREAM_REVISION: Branching

QEMU_UPSTREAM_REVISION set back to master, to track the tip.

QEMU_TAG set to the specific changeset as is customary.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Fri, 21 Feb 2014 16:59:14 +0000 (16:59 +0000)]
README, xen/Makefile: Branching for 4.5

Change version numbers to 4.5-unstable.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Tue, 18 Feb 2014 16:43:42 +0000 (16:43 +0000)]
libxl: Properly declare libxlu_disk_l.h in AUTOINCS

This is necessary so that make doesn't do things which depend on this
file until flex has finished producing it.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Olaf Hering <olaf@aepfle.de>
Tested-by: Olaf Hering <olaf@aepfle.de>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Julien Grall [Tue, 18 Feb 2014 13:58:21 +0000 (13:58 +0000)]
xen/arm: Save/restore GICH_VMCR on domain context switch

The GICH_VMCR register contains aliases of important bits of the GICV
interface, such as:
    - priority mask of the CPU
    - EOImode
    - ...

We were safe because Linux guests always use the same values for these
bits. When new guests handle priority or change the EOI mode, VCPU
interrupt management will be in the wrong state.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Tue, 18 Feb 2014 16:56:17 +0000 (16:56 +0000)]
xen/arm: Correctly handle non-page aligned pointer in raw_copy_from_guest

The current implementation of the raw_copy_guest helper may lead to data
corruption and sometimes a Xen crash when the guest virtual address is
not aligned to PAGE_SIZE.

When the total length is greater than a page, the length to read is
computed incorrectly with
    min(len, (unsigned)(PAGE_SIZE - offset))

As the offset is only computed once per call, if the start address was
not aligned to PAGE_SIZE, a single iteration can end up:
    - reading across a page boundary => Xen crash
    - reading the previous page => data corruption

This issue can be resolved by setting offset to 0 at the end of the first
iteration. After that, the guest virtual address is always aligned to
PAGE_SIZE.
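
A simplified model of the loop, assuming a flat buffer stands in for guest memory and a hypothetical 16-byte page. Each iteration "maps" the page containing `addr` and copies from `offset` within it; passing `reset_offset = 0` reproduces the bug, in which later chunks are read from the wrong place in the page.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16u   /* tiny page, for illustration only */

/*
 * Copy len bytes starting at guest address addr, one page at a time.
 * "guest" is a flat buffer standing in for guest memory; each iteration
 * "maps" the page containing addr and copies from offset within it.
 */
static void copy_from_guest_sketch(char *dst, const char *guest, size_t addr,
                                   size_t len, int reset_offset)
{
    size_t offset = addr & (SKETCH_PAGE_SIZE - 1);

    while ( len )
    {
        size_t size = SKETCH_PAGE_SIZE - offset;
        const char *page = guest + (addr & ~(size_t)(SKETCH_PAGE_SIZE - 1));

        if ( size > len )
            size = len;
        memcpy(dst, page + offset, size);

        len -= size;
        dst += size;
        addr += size;

        /* The fix: addr is page aligned from now on, so offset must be 0. */
        if ( reset_offset )
            offset = 0;
    }
}
```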

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
[ ijc -- duplicated the comment in the other two functions with this behaviour ]

Ian Jackson [Mon, 17 Feb 2014 16:33:48 +0000 (16:33 +0000)]
Update QEMU_UPSTREAM_REVISION for 4.4.0-rc4

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Mukesh Rathor [Thu, 13 Feb 2014 16:56:39 +0000 (17:56 +0100)]
pvh: Fix regression due to assumption that HVM paths MUST use io-backend device

Commit 09bb434748af9bfe3f7fca4b6eef721a7d5042a4
"Nested VMX: prohibit virtual vmentry/vmexit during IO emulation"
assumes that the HVM paths are only taken by HVM guests. With PVH
enabled that is no longer the case, which means that the IO-backend
device (QEMU) is not necessarily present.

As such, that patch can crash the hypervisor:

Xen call trace:
    [<ffff82d0801ddd9a>] nvmx_switch_guest+0x4d/0x903
    [<ffff82d0801de95b>] vmx_asm_vmexit_handler+0x4b/0xc0

Pagetable walk from 000000000000001e:
  L4[0x000] = 0000000000000000 ffffffffffffffff

****************************************
Panic on CPU 7:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: 000000000000001e
****************************************

as we do not have an IO-based backend. If the PVH guest does run an
HVM guest inside it, further work is needed to support this; for now
the check will bail us out.

We also fix spelling mistakes and the sentence structure.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yang Zhang [Thu, 13 Feb 2014 15:50:22 +0000 (15:50 +0000)]
When log dirty mode is enabled, all of the guest's memory is set to
read-only. In a HAP-enabled domain, this clears the write bit in all EPT
entries. This causes a problem if VT-d shares the page table with EPT:
when a device issues a DMA write request, the VT-d engine finds the
target memory read-only, resulting in a VT-d fault.

Currently, log dirty mode is enabled in two places: migration and VRAM
tracking. Migration with an assigned device is not allowed, so that case
is fine. VRAM tracking does not need to set all memory to read-only;
tracking only the VRAM range is enough.
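
A toy model of the difference, assuming a hypothetical array of per-page writable bits standing in for EPT entries: enabling log dirty mode globally makes every page read-only (including pages a device may DMA into), whereas VRAM tracking only needs to write-protect its own range.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_GFNS 32   /* hypothetical guest size, in pages */

/* Hypothetical per-page write permission, standing in for EPT entries. */
static bool writable[NR_GFNS];

static void make_all_writable(void)
{
    for ( size_t gfn = 0; gfn < NR_GFNS; gfn++ )
        writable[gfn] = true;
}

/* Global log dirty: every page becomes read-only, DMA targets included. */
static void log_dirty_enable_all(void)
{
    for ( size_t gfn = 0; gfn < NR_GFNS; gfn++ )
        writable[gfn] = false;
}

/* VRAM tracking: only pages in [first, first + nr) need write faults. */
static void track_vram_range(size_t first, size_t nr)
{
    for ( size_t gfn = first; gfn < first + nr && gfn < NR_GFNS; gfn++ )
        writable[gfn] = false;
}
```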

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 13 Feb 2014 15:13:07 +0000 (15:13 +0000)]
xen: Don't use __builtin_stdarg_start().

Cset fca49a00 ("netbsd: build fix with gcc 4.5") changed the
definition of va_start() to use __builtin_va_start() rather than
__builtin_stdarg_start() for GCCs >= 4.5, but in fact GCC dropped
__builtin_stdarg_start() before v3.3.

Signed-off-by: Tim Deegan <tim@xen.org>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Olaf Hering [Thu, 13 Feb 2014 14:43:24 +0000 (15:43 +0100)]
docs: mention whitespace handling in diskspec target= parsing

disk=[ ' target=/dev/loop0 ' ] will fail to parse because
'/dev/loop0 ' (with the trailing space) does not exist.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tim Deegan [Thu, 13 Feb 2014 12:13:58 +0000 (12:13 +0000)]
xen: stop trying to use the system <stdarg.h> and <stdbool.h>

We already have our own versions of the stdarg/stdbool definitions, for
systems where those headers are installed in /usr/include.

On Linux, they're typically installed in compiler-specific paths, but
finding them has proved unreliable.  Drop that and use our own versions
everywhere.
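
A minimal sketch of what such private definitions can look like, assuming GCC or Clang (both provide the `__builtin_va_*` builtins). This is an illustration rather than a copy of Xen's headers, and it is meant to replace the system <stdarg.h> and <stdbool.h>, not to be mixed with them.

```c
/* <stdarg.h> replacement built directly on compiler builtins. */
typedef __builtin_va_list va_list;
#define va_start(ap, last) __builtin_va_start((ap), (last))
#define va_arg(ap, type)   __builtin_va_arg((ap), type)
#define va_end(ap)         __builtin_va_end(ap)

/* <stdbool.h> replacement. */
#ifndef __bool_true_false_are_defined
#define bool  _Bool
#define true  1
#define false 0
#define __bool_true_false_are_defined 1
#endif

/* Example users of both sets of definitions. */
static int sum_ints(int n, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, n);
    for ( int i = 0; i < n; i++ )
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}

static bool is_even(int x)
{
    return (x % 2) == 0;
}
```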

Signed-off-by: Tim Deegan <tim@xen.org>
Tested-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 13 Feb 2014 12:57:43 +0000 (12:57 +0000)]
tools/configure: correct --enable-blktap1 help text

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Daniel De Graaf [Tue, 11 Feb 2014 15:25:17 +0000 (10:25 -0500)]
docs/vtpm: fix auto-shutdown reference

The automatic shutdown feature of the vTPM was removed because it
interfered with pv-grub measurement support and was also not triggered
if the guest did not use the vTPM. Virtual TPM domains will need to be
shut down or destroyed on guest shutdown via a script or other user
action.

This also fixes an incorrect reference to the vTPM being PV-only.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Boris Ostrovsky [Thu, 13 Feb 2014 09:49:55 +0000 (10:49 +0100)]
x86/pci: Store VF's memory space displacement in a 64-bit value

VF's memory space offset can be greater than 4GB and therefore needs
to be stored in a 64-bit variable.
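
The arithmetic problem in a few lines, assuming (hypothetically) that VF n's BAR sits at `base + n * size`. With 16 VFs of 256MB each, the displacement reaches exactly 4GB, which a 32-bit variable wraps to zero. Both helpers are illustrative, not Xen's PCI code.

```c
#include <stdint.h>

/* Buggy: the displacement is computed and stored in 32 bits. */
static uint64_t vf_bar_addr_buggy(uint64_t base, uint32_t vf, uint32_t size)
{
    uint32_t disp = vf * size;            /* wraps modulo 2^32 */
    return base + disp;
}

/* Fixed: widen to 64 bits before multiplying. */
static uint64_t vf_bar_addr(uint64_t base, uint32_t vf, uint32_t size)
{
    uint64_t disp = (uint64_t)vf * size;  /* no wrap below 2^64 */
    return base + disp;
}
```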

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Ian Campbell [Wed, 12 Feb 2014 14:27:37 +0000 (14:27 +0000)]
xl: suppress suspend/resume functions on platforms which do not support it.

ARM does not (currently) support migration, so stop offering tasty-looking
treats like "xl migrate".

Apart from the UI improvement, my intention is to use this in osstest to detect
whether to attempt the save/restore/migrate tests.

Other than the additions of the #define/#ifdef there is a tiny bit of code
motion ("dump-core" in the command list and core_dump_domain in the
implementations) which serves to put ifdeffable bits next to each other.
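
A minimal sketch of the pattern, assuming a hypothetical `SKETCH_NO_SUSPEND_RESUME` macro that a platform would define when it lacks save/restore/migrate; the command names are only examples and this is not xl's real command table. Keeping the affected entries adjacent lets a single `#ifndef` cover them, which is the point of the code motion described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature macro; a platform without migration defines it. */
/* #define SKETCH_NO_SUSPEND_RESUME 1 */

struct cmd_spec {
    const char *name;
};

static const struct cmd_spec cmd_table[] = {
    { "create" },
    { "dump-core" },
#ifndef SKETCH_NO_SUSPEND_RESUME
    /* Grouped together so one #ifndef hides all of them. */
    { "save" },
    { "migrate" },
    { "restore" },
#endif
    { "list" },
};

/* Look up a command by name; NULL if it is not offered. */
static const struct cmd_spec *find_cmd(const char *name)
{
    for ( size_t i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++ )
        if ( strcmp(cmd_table[i].name, name) == 0 )
            return &cmd_table[i];
    return NULL;
}
```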

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Wed, 22 Jan 2014 17:47:21 +0000 (17:47 +0000)]
libxc: Fix out-of-memory error handling in xc_cpupool_getinfo()

Avoid freeing info then returning it to the caller.
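
A sketch of the corrected error path, with a hypothetical `struct pool_info` and a `fail_cpumap` flag that simulates the second allocation failing; this is not the libxc code. On failure everything allocated so far is freed and NULL is returned, so the caller never receives a dangling pointer.

```c
#include <stdlib.h>

/* Hypothetical result type with a separately allocated member. */
struct pool_info {
    int id;
    unsigned int *cpumap;
};

static struct pool_info *get_info(size_t nr_cpus, int fail_cpumap)
{
    struct pool_info *info = malloc(sizeof(*info));

    if ( info == NULL )
        return NULL;

    /* fail_cpumap simulates an out-of-memory failure for testing. */
    info->cpumap = fail_cpumap ? NULL
                               : calloc(nr_cpus, sizeof(unsigned int));
    if ( info->cpumap == NULL )
    {
        free(info);
        return NULL;   /* never hand the freed pointer back */
    }

    info->id = 0;
    return info;
}
```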

This is XSA-88.

Coverity-ID: 1056192
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Wed, 12 Feb 2014 16:52:26 +0000 (16:52 +0000)]
xen: Drop N from rcN in XEN_EXTRAVERSION

Having this here means we have to wait for a push gate pass, or fart
about with explicit pushes to master, to make an RC.  The boot
messages for git builds already contain the git revision (as a
shorthash).

I will change the tarball creation checklist to seddery the -rc back
to -rcN, along with the other release-management-related changes (like
using an embedded copy of qemu).

If this patch meets with approval it should be thrown into the push
gate today, along with the patch for XSA-88, and then hopefully
nothing much else, so that we can get something suitable for making an
RC from by Friday.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell [Wed, 12 Feb 2014 12:59:14 +0000 (12:59 +0000)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

Jan Beulich [Wed, 12 Feb 2014 12:49:11 +0000 (13:49 +0100)]
blkif: drop struct blkif_request_segment_aligned

Commit 5148b7b5 ("blkif: add indirect descriptors interface to public
headers") added this without really explaining why it is needed: The
structure is identical to struct blkif_request_segment apart from the
padding field not being given a name in the pre-existing type. Their
size and alignment - which are what is relevant - are identical as long
as __alignof__(uint32_t) == 4 (which I think we rely upon in various
other places, so we can take as given).
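
The argument can be checked mechanically. The structures below are simplified stand-ins for struct blkif_request_segment and struct blkif_request_segment_aligned (the field layout is assumed here); the static assertions hold whenever uint32_t is 4-byte aligned, because the named `_pad` merely occupies the bytes the compiler would otherwise add as padding.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sketch_grant_ref_t;   /* assumed 32-bit grant reference */

/* Padding left implicit, as in the pre-existing type. */
struct sketch_segment {
    sketch_grant_ref_t gref;
    uint8_t first_sect, last_sect;
};

/* Same layout with the padding given a name. */
struct sketch_segment_aligned {
    sketch_grant_ref_t gref;
    uint8_t first_sect, last_sect;
    uint16_t _pad;
};

_Static_assert(sizeof(struct sketch_segment) ==
               sizeof(struct sketch_segment_aligned), "sizes differ");
_Static_assert(_Alignof(struct sketch_segment) ==
               _Alignof(struct sketch_segment_aligned), "alignments differ");
```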

Also correct a few minor glitches in the description, including no
longer assuming PAGE_SIZE == 4096.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Ian Campbell [Tue, 11 Feb 2014 14:11:04 +0000 (14:11 +0000)]
xen: arm: correct terminology for cache flush macros

The term "flush" is slightly ambiguous. The correct ARM term for this
operation is clean, as opposed to clean+invalidate, for which we now also
have a function.

This is a pure rename, no functional change.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Tue, 11 Feb 2014 14:11:03 +0000 (14:11 +0000)]
Revert "xen: arm: force guest memory accesses to cacheable when MMU is disabled"

This reverts commit 89eb02c2204a0b42a0aa169f107bc346a3fef802.

This approach has a shortcoming in that it breaks when a guest enables its
MMU (SCTLR.M, disabling HCR.DC) without enabling caches (SCTLR.C) first/at the
same time. It turns out that FreeBSD does this.

This has now been fixed (yet) another way (third time is the charm!) so remove
this support. The original commit contained some fixes which are still
relevant even with the revert of the bulk of the patch:
 - Correction to HSR_SYSREG_CRN_MASK
 - Rename of HSR_SYSCTL macros to avoid naming clash
 - Definition of some additional cp reg specifications

Since these are still useful they are not reverted.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Tue, 11 Feb 2014 14:11:02 +0000 (14:11 +0000)]
xen/arm: clean and invalidate all guest caches by VMID after domain build.

Guests are initially started with caches disabled and so we need to make sure
they see consistent data in RAM (requiring a cache clean) but also that they
do not have old stale data suddenly appear in the caches when they enable
their caches (requiring the invalidate).

This can be split into two halves. First we must flush each page as it is
allocated to the guest. It is not sufficient to do the flush at scrub time
since this will miss pages which are ballooned out by the guest (where the
guest must scrub if it cares about not leaking the page contents). We need to
clean as well as invalidate to make sure that any scrubbing which has occurred
gets committed to real RAM. To achieve this, add a new cacheflush_page function,
which is a stub on x86.

Secondly we need to flush anything which the domain builder touches, which we
do via a new domctl.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: keir@xen.org
Ian Campbell [Tue, 11 Feb 2014 14:11:01 +0000 (14:11 +0000)]
xen: arm: rename p2m next_gfn_to_relinquish to lowest_mapped_gfn

This has uses other than during relinquish, so rename it for clarity.

This is a pure rename.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Tue, 11 Feb 2014 14:11:00 +0000 (14:11 +0000)]
xen: arm: rename create_p2m_entries to apply_p2m_changes

This function hasn't been only about creating entries for quite a while.

This is purely a rename.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Julien Grall [Mon, 10 Feb 2014 17:34:46 +0000 (17:34 +0000)]
xen/arm: Correctly boot with an initrd and no linux command line

When the DOM0 device tree is being built, the initrd properties are only
added if there is a Linux command line. This results in a panic later:

(XEN) *** LOADING DOMAIN 0 ***
(XEN) Populate P2M 0x20000000->0x40000000 (1:1 mapping for dom0)
(XEN) Loading kernel from boot module 2
(XEN) Loading zImage from 0000000001000000 to 0000000027c00000-0000000027eafb48
(XEN) Loading dom0 initrd from 0000000002000000 to 0x0000000028200000-0x0000000028c00000
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Cannot fix up "linux,initrd-start" property
(XEN) ****************************************
(XEN)

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Don Slutz [Fri, 7 Feb 2014 21:51:51 +0000 (16:51 -0500)]
xenlight_stubs.c: Allow it to build with ocaml 3.09.3

This code was copied from:

http://docs.camlcity.org/docs/godisrc/oasis-ocaml-fd-1.1.1.tar.gz/ocaml-fd-1.1.1/lib/fd_stubs.c

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: David Scott <dave.scott@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Pranavkumar Sawargaonkar [Fri, 7 Feb 2014 12:57:16 +0000 (18:27 +0530)]
xen: arm: arm64: Fix memory clobbering issues during VFP save restore.

This patch addresses the memory clobbering issue mentioned by Julien Grall
with my earlier patch -
Commit Id: 712eb2e04da2cbcd9908f74ebd47c6df60d6d12f

Discussion related to this fix -
http://www.gossamer-threads.com/lists/xen/devel/316247

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 11 Feb 2014 10:14:10 +0000 (11:14 +0100)]
flask: check permissions first thing in flask_security_set_bool()

Nothing else should be done if the caller isn't permitted to set
boolean values.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 11 Feb 2014 10:13:22 +0000 (11:13 +0100)]
flask: fix error propagation from flask_security_set_bool()

The function should return an error when flask_security_make_bools()
fails as well as when the input ID is out of range.
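
A sketch of the ordering these two flask fixes aim for, with hypothetical stand-ins for the permission check and for flask_security_make_bools(): check permission before doing anything else, then propagate every failure to the caller. The specific error codes (-EPERM, -ENOMEM, -ENOENT) are assumptions for illustration, not taken from the flask code.

```c
#include <errno.h>

#define NR_BOOLS 4                /* hypothetical number of booleans */

static int bool_values[NR_BOOLS];
static int caller_permitted;      /* stand-in for the permission check */
static int make_bools_fails;      /* stand-in for make_bools() failing */

static int check_set_bool_permission(void)
{
    return caller_permitted ? 0 : -EPERM;
}

static int make_bools(void)
{
    return make_bools_fails ? -ENOMEM : 0;
}

static int set_bool(int id, int value)
{
    /* Permission first: an unprivileged caller must not trigger any work. */
    int rc = check_set_bool_permission();

    if ( rc )
        return rc;

    rc = make_bools();
    if ( rc )
        return rc;                /* propagate the failure, don't return 0 */

    if ( id < 0 || id >= NR_BOOLS )
        return -ENOENT;           /* out-of-range ID is also an error */

    bool_values[id] = !!value;
    return 0;
}
```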

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 11 Feb 2014 10:11:48 +0000 (11:11 +0100)]
flask: fix memory leaks

Plus, in the case of security_preserve_bools(), prevent double freeing
in the case of security_get_bools() failing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Mon, 10 Feb 2014 09:05:24 +0000 (10:05 +0100)]
AMD IOMMU: fail if there is no southbridge IO-APIC

... but interrupt remapping is requested (with per-device remapping
tables). Without it, the timer interrupt usually does not work.

Inspired by Linux's "iommu/amd: Work around wrong IOAPIC device-id in
IVRS table" (commit c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059) by Joerg
Roedel <joerg.roedel@amd.com>.

Reported-by: Eric Houby <ehouby@yahoo.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Eric Houby <ehouby@yahoo.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Aravind Gopalakrishnan [Fri, 7 Feb 2014 10:12:22 +0000 (11:12 +0100)]
x86/AMD: apply workaround for AMD F16h erratum 792

The workaround for the erratum will only be in BIOSes spun from January
2014 onwards, but initial production parts shipped in 2013. Since there
is a coverage hole, we should carry this fix in software in case the BIOS
does not do the right thing or someone is using an old BIOS.

Description:
 The processor does not ensure that the DRAM scrub read/write sequence is
 atomic with respect to accesses to the CC6 save state area. Therefore, if
 a concurrent scrub read/write access is to the same address, the entry
 may appear as if it was not written. This quirk applies to Fam16h models
 00h-0Fh.
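
A sketch of the applicability check, decoding family and model from CPUID leaf 1 EAX with the AMD convention (the extended fields apply only when the base family is 0xF). The helper names are hypothetical and this is not Xen's code; 0x00700F01 is used as an example Fam16h model 00h signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Family = base family, plus extended family when base family is 0xF. */
static unsigned int cpu_family(uint32_t eax)
{
    unsigned int fam = (eax >> 8) & 0xf;

    if ( fam == 0xf )
        fam += (eax >> 20) & 0xff;
    return fam;
}

/* Model = base model, with extended model on top when base family is 0xF. */
static unsigned int cpu_model(uint32_t eax)
{
    unsigned int model = (eax >> 4) & 0xf;

    if ( ((eax >> 8) & 0xf) == 0xf )
        model |= ((eax >> 16) & 0xf) << 4;
    return model;
}

/* Erratum 792 applies to Fam16h models 00h-0Fh. */
static bool erratum_792_applies(uint32_t eax)
{
    return cpu_family(eax) == 0x16 && cpu_model(eax) <= 0x0f;
}
```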

See "Revision Guide" for AMD F16h models 00h-0fh, document 51810 rev.
3.04, Nov 2013.

Equivalent Linux patch link:
 http://marc.info/?l=linux-kernel&m=139066012217149&w=2

Tested the patch on a Fam16h server platform and it works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Corrected checking for boot CPU. Made warning message conditional.
Compacted warning message text. Moved comment to commit message.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Thu, 6 Feb 2014 19:17:26 +0000 (19:17 +0000)]
libxl: test programs: Fix make race re libxenlight.so

The test programs were getting the real libxenlight.so on their link
line.  Filter it out.  Also change the soname of the test library to
match the real one, so that libxlutil is satisfied with it.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 6 Feb 2014 18:41:24 +0000 (18:41 +0000)]
libxl: test programs: Fix Makefile race re headers

We need to include the new TEST_PROG_OBJS and LIBXL_TEST_OBJS in the
appropriate dependencies.  Otherwise we risk trying to build the test
program before gentypes is run.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Marek Marczykowski-Górecki [Thu, 6 Feb 2014 15:44:41 +0000 (16:44 +0100)]
libvchan: Fix handling of invalid ring buffer indices

The remote (hostile) process can set ring buffer indices to any value
at any time. If that happens, the computed "buffer space" (either free
space for writing data, or data ready for reading) can be negative or
greater than the buffer size.  This ends up as a buffer overflow in the
second memcpy inside do_send/do_recv.

Fix this by introducing new available bytes accessor functions
raw_get_data_ready and raw_get_buffer_space which are robust against
mad ring states, and only return sanitised values.

Proof sketch of correctness:

Now {rd,wr}_{cons,prod} are only ever used in the raw available bytes
functions, and in do_send and do_recv.

The raw available bytes functions do unsigned arithmetic on the
returned values.  If the result is "negative" or too big it will be
>ring_size (since we used unsigned arithmetic).  Otherwise the result
is a positive in-range value representing a reasonable ring state, in
which case we can safely convert it to int (as the rest of the code
expects).

do_send and do_recv immediately mask the ring index value with the
ring size.  The result is always going to be plausible.  If the ring
state has become mad, the worst case is that our behaviour is
inconsistent with the peer's ring pointer.  I.e. we read or write to
arguably-incorrect parts of the ring - but always parts of the ring.
And of course if a peer misoperates the ring they can achieve this
effect anyway.

So the security problem is fixed.
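
A minimal sketch of the sanitising accessors, assuming a hypothetical `struct ring` with free-running 32-bit indices and a power-of-two size; it is not the libvchan code. As described above, the subtraction is unsigned, so a "negative" result becomes a huge value that fails the range check. Treating an impossible state as 0 bytes available (which stalls the ring rather than overflowing it) is a choice made for this sketch.

```c
#include <stdint.h>

/* Hypothetical shared ring: prod/cons are writable by the (hostile) peer. */
struct ring {
    volatile uint32_t prod;  /* free-running producer index */
    volatile uint32_t cons;  /* free-running consumer index */
    uint32_t size;           /* ring size in bytes, a power of two */
};

/* Bytes available to read; 0 if the indices are impossible. */
static int data_ready(const struct ring *r)
{
    uint32_t ready = r->prod - r->cons;  /* unsigned, so "negative" is huge */

    if ( ready > r->size )
        return 0;                        /* mad state: report nothing */
    return (int)ready;
}

/* Bytes available to write; 0 if the indices are impossible. */
static int buffer_space(const struct ring *r)
{
    uint32_t ready = r->prod - r->cons;

    if ( ready > r->size )
        return 0;
    return (int)(r->size - ready);
}

/* Mask an index into the ring, so any access stays inside the buffer. */
static uint32_t ring_offset(const struct ring *r, uint32_t idx)
{
    return idx & (r->size - 1);
}
```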

This is XSA-86.

(The patch is essentially Ian Jackson's work, although parts of the
commit message are by Marek.)

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>